

Learning 3D Robotics Perception using Inductive Priors

Irshad, Muhammad Zubair

arXiv.org Artificial Intelligence

Recent advances in deep learning have led to data-centric intelligence, i.e. artificially intelligent models that can ingest large amounts of data and perform digital tasks such as text-to-image generation, machine-human conversation, and image recognition very well. This thesis covers learning with structured inductive biases and priors to design approaches and algorithms that unlock the potential of principle-centric intelligence. Prior knowledge (priors for short), often available as past experience and as assumptions about how the world works, helps an autonomous agent generalize better and adapt its behavior based on past experience. In this thesis, I demonstrate the use of prior knowledge in three robotics perception problems: (1) object-centric 3D reconstruction, (2) vision and language for decision-making, and (3) 3D scene understanding. To solve these challenging problems, I propose various sources of prior knowledge, including (1) geometry and appearance priors from synthetic data, (2) modularity and semantic map priors, and (3) semantic, structural, and contextual priors. I study these priors for solving robotics 3D perception tasks and propose ways to encode them efficiently in deep learning models. Some priors are used to warm-start the network for transfer learning; others are used as hard constraints to restrict the action space of robotics agents. Whereas classical techniques are brittle and fail to generalize to unseen scenarios, and data-centric approaches require large amounts of labeled data, this thesis aims to build intelligent agents that require very little real-world data, or data acquired only from simulation, to generalize to highly dynamic and cluttered environments in novel simulations (sim2sim) or unseen real-world environments (sim2real) for a holistic scene understanding of the 3D world.
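
As a rough illustration of the warm-starting idea mentioned in the abstract, the sketch below loads weights pretrained on synthetic data into an encoder before fine-tuning on scarce real data. The Encoder architecture, checkpoint name, and helper function are hypothetical stand-ins, not the thesis's actual models.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Hypothetical perception backbone; stands in for whatever network the priors warm-start.
    def __init__(self, in_dim=128, hidden=256, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def warm_start(model, synthetic_ckpt):
    # Load weights learned on synthetic data as a geometry/appearance prior,
    # then fine-tune on (scarce) real data; strict=False tolerates head mismatches.
    state = torch.load(synthetic_ckpt, map_location="cpu")
    model.load_state_dict(state, strict=False)
    return model

# Usage (assuming a checkpoint pretrained on synthetic scenes exists):
# encoder = warm_start(Encoder(), "synthetic_pretrain.pt")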


Triple 'Strong Accept' for CVPR 2019: Reinforced Cross-Modal Matching & Self-Supervised Imitation…

#artificialintelligence

The Conference on Computer Vision and Pattern Recognition (CVPR) is one of the world's top computer vision (CV) conferences. CVPR 2019 runs June 15 through June 21 in Long Beach, California, and the list of accepted papers for the prestigious gathering has now been released. A total of 1,300 papers were accepted from a record-high 5,165 submissions this year, and one standout already garnering attention is Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation. The paper is said to have received three "Strong Accepts" in peer review and to have ranked No. 1, according to University of California, Santa Barbara NLP Group Director William Wang, who is also one of the paper's authors. The paper proposes a new method for vision-language navigation (VLN) tasks that combines the strengths of reinforcement learning and self-supervised imitation learning.


Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Wang, Xin, Huang, Qiuyuan, Celikyilmaz, Asli, Gao, Jianfeng, Shen, Dinghan, Wang, Yuan-Fang, Wang, William Yang, Zhang, Lei

arXiv.org Artificial Intelligence

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges for this task: cross-modal grounding, ill-posed feedback, and generalization. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). In particular, a matching critic provides an intrinsic reward that encourages global matching between instructions and trajectories, and a reasoning navigator performs cross-modal grounding in the local visual scene. Evaluation on a VLN benchmark dataset shows that our RCM model significantly outperforms existing methods by 10% on SPL and achieves new state-of-the-art performance. To improve the generalizability of the learned policy, we further introduce a Self-Supervised Imitation Learning (SIL) method that explores unseen environments by imitating the agent's own past, good decisions. We demonstrate that SIL can approximate a better and more efficient policy, which substantially reduces the success-rate performance gap between seen and unseen environments (from 30.7% to 11.7%).
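
As a rough sketch of the reward structure described in the abstract, the code below combines an extrinsic navigation reward with an intrinsic reward from a matching critic that scores instruction-trajectory agreement. The module definitions, embedding sizes, and the mixing weight delta are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class MatchingCritic(nn.Module):
    # Toy stand-in for the matching critic: scores how well an executed
    # trajectory matches the given instruction (higher is better).
    def __init__(self, instr_dim=64, traj_dim=64, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(instr_dim + traj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, instr_emb, traj_emb):
        # Probability-like matching score in [0, 1].
        return torch.sigmoid(self.score(torch.cat([instr_emb, traj_emb], dim=-1)))

def total_reward(extrinsic, instr_emb, traj_emb, critic, delta=0.5):
    # Assumed mixing rule: extrinsic navigation reward plus a weighted
    # intrinsic cross-modal matching reward from the critic.
    with torch.no_grad():
        intrinsic = critic(instr_emb, traj_emb).squeeze(-1)
    return extrinsic + delta * intrinsic

# Usage with dummy tensors in place of real instruction/trajectory embeddings:
critic = MatchingCritic()
r = total_reward(torch.tensor([1.0]), torch.randn(1, 64), torch.randn(1, 64), critic)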